How to use the tinyForestR package

What is the tinyForestR package?

The aim of tinyForestR is to provide set of tools designed to extract, manipulate and analyse data relevant to the location of Tiny Forests in the UK.

Specifically it extracts and processes landcover and biodiversity data from a range of sources for a given area around Tiny Forest locations, and provides a set of tools for analysing citizen science data derived directly from Tiny Forests.

Tiny Forests are small scale dense tree plantings usually in urban areas - and are managed by Earthwatch. https://earthwatch.org.uk/get-involved/tiny-forest

Getting started

The package is hosted on Github and is a work in progress. It can be installed by running devtools::install_github("julianflowers/tinyForestR"). It loads another Github package myScrapers which is needed to download data from the Tiny Forest website.

The package makes use of a number of Application Programming Interfaces (APIs) some of which require API keys which will need to be applied for separately. This is outlined in the relevant sections of this vignette.

It uses a range of Python packages to access some datasets (in some cases Python packages are better developed than R). For this reason the first step is to run initialise_tf() to intialise the package.

This:

Install and initialise

First install and load the package.


if(!require("tinyForestR"))
devtools::install_github("julianflowers/tiny-forest/tinyfoRest", force = FALSE)
devtools::install_github("julianflowers/myScrapers")
needs(tinyfoRest, sf, myScrapers, rvest)

The initialise tinyForestR

initialise_tf()
#> virtualenv: tinyforest
#> Using virtual environment 'tinyforest' ...

Load Tiny Forest data

The next step is to load Tiny Forest (TF) data. Because this only exists in a series of web pages the get_tf_data function identifies the relevant pages and iterates over them to extract name, id, location, area, planters, and types of tree planted (as a list column), for those TFs planted at the time of extraction. It does include TFs planted outside the UK. The function takes about 120 seconds to iterate over all the relevant pages.

get_tf_data does 4 things:


tf_data <- get_tf_data()
#> 135.002 sec elapsed

Producing a simple map is as simple as loading tf_data$map.


tf_data$map

Or we can draw a more sophisticated static map using tf_data$tidy_tf_sf.


## load UK boundary files
uk <- read_sf("https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/Countries_December_2022_UK_BGC/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson")

ggplot() +
  geom_sf(data = uk, fill = "white") +
  geom_sf(data = tf_data$tidy_tf_sf, pch = 1, aes(colour = factor(year(plant_date)))) +
  labs(title = "Tiny Forest locations 2024",
      subtitle = paste("N = ", nrow(tf_data$tidy_tf_sf))) +
  theme_void() +
  scale_colour_viridis_d(option = "turbo")

It is simple to draw time series plots…


df <- tf_data$tidy_tf_sf

df |>
  count(year = year(plant_date)) |>
  ggplot() +
  geom_col(aes(year, n)) +
  geom_label(aes(year, n, label = n)) +
  labs(title = "TFs planted per year") +
  theme_minimal() +
  theme(plot.title.position = "plot")

Once the data is loaded we can save it as a csv file and get some high level information on planting, timings, size and so on.


needs(patchwork)

## annual planting

tf_trees <- df |>
  mutate(trees_1 = str_split(trees, "\\|")) |>
  select(-trees) |>
  unnest("trees_1") 

tf_trees |>
  group_by(tf_id) |>
  summarise(n = n()) |>
  ggplot(aes(n)) +
  geom_histogram() +
  labs(title = "Distribution of tree species", 
       x = "Number of tree species per TF") +
  theme_minimal() +
  theme(plot.title.position = "plot")

Tree species

We can also look at planting frequency for different tree species.


tf_trees |>
  ungroup() |>
  count(trees_1, sort = TRUE) |>
  top_n(25, n) |>
  ggplot() +
  geom_col(aes(n, reorder(trees_1, n))) +
  labs(y = "", 
       title = "Most commonly planted tree species", 
       x = "Number of Tiny Forests") +
 # ggthemes::theme_base() +
  theme(plot.title.position = "plot") +
  theme_minimal() +
  theme(plot.title.position = "plot")

What do TFs look like?

We can also download an image for a TF via Google Maps. This requires an API key which can be obtained as follows:

https://developers.google.com/maps/documentation/streetview/cloud-setup

library(ggmap);library(magick)

tf_df <- tf_data$tidy_tf

api_key <- Sys.getenv("GOOGLE_MAPS")

image <- tinyForestR::get_tf_images(lon = tf_df$lon[100], lat = tf_df$lat[100], key = api_key, zoom = 20, tf_id = tf_df$tf_id[100])

image_read(image )

Biodiversity data

The get_nbn_buffer downloads occurrence data from the NBN Atlas in a set buffer around a given longitude and latitude. For example we can download 30000 records around (Darlington TF). Note: when extracting data from APIs it helps to create a “safe” version of your function espacially if you aare making multiple API calls (see below). The makes sure that your function will keep on running. This is achieved by using the safely function which stores the output as a list with 2 variables - result and error. To get the results we need to extract the result vector.

safe_buff <- safely(tinyfoRest::get_nbn_buffer, otherwise = NA_real_)

nbn_data <- safe_buff(tf_df$lon[100], tf_df$lat[100], n = 30000, tf_id = tf_df$tf_id[100])
#> 1.435 sec elapsed
nbn_out <- nbn_data$result |>
  mutate( tf_id = tf_df$tf_id[100])

nbn_out |>
  filter(year > 2014) |>
  st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = 4326) |>
  ggplot() +
  geom_sf(aes(colour = phylum))

For ecological analyses we need to transform the data from long to wide formats. With an NBN atlas extract we can deploy the create_species_matrix function we creates a species matrix and a list of sites or observation ids.

For example we can extract a matrix of all insect observations…


needs(tinyForestR)

## create a matrix of observations by class and year
spp_matrix <- create_species_matrix(nbn_out, class = "Insecta")


tf_bd_insects <- calc_bd_metrics(df = nbn_out, class = "Insecta")
tf_bd_birds <- calc_bd_metrics(df = nbn_out, class = "Aves")
tf_bd_mammals <- calc_bd_metrics(df = nbn_out, class = "Mammalia")

tf_bd_mammals$metrics
#>   month richness  N     ratio diversity
#> 1    03        1  5 0.2000000 0.0000000
#> 2    04        1  6 0.1666667 0.0000000
#> 3    05        6 10 0.6000000 0.7800000
#> 4    06        5 15 0.3333333 0.7822222
#> 5    08        5  6 0.8333333 0.7777778
#> 6    09        1  4 0.2500000 0.0000000
#> 7    10        1  2 0.5000000 0.0000000
#> 8  <NA>        1  2 0.5000000 0.0000000

Scaling up

To compare biodiversity across TF areas we can extract NBN data from multiple sites using the purrr package. Set .progress = TRUE to track the time it takes to run. Note: the NBN atlas API has a 500000 download limit, but it sometimes a bit temperamental.


## sample tf_ids
## 

sample <- sample_n(tf_df, 20)
output <- map(1:nrow(sample), \(x) safe_buff(lon = sample$lon[x], lat = sample$lat[x], tf_id = sample$tf_id[x], n = 30000), .progress = TRUE)
#> 0.154 sec elapsed
#> 5.448 sec elapsed
#> 3.593 sec elapsed
#> 9.58 sec elapsed
#> 1.997 sec elapsed
#> 2.462 sec elapsed
#> 5.09 sec elapsed
#> 5.501 sec elapsed
#> 4.279 sec elapsed
#> 17.907 sec elapsed
#> 4.786 sec elapsed
#> 19.061 sec elapsed
#> 1.062 sec elapsed
#> 1.117 sec elapsed
#> 11.114 sec elapsed
#> 0.259 sec elapsed
#> 3.96 sec elapsed
#> 1.631 sec elapsed
#> 5.553 sec elapsed
#> 4.237 sec elapsed

res <- map(output, "result")

res_1 <- res[which(!is.na(res))]

res_df <- list_rbind(res_1)

Explore data

How many observations per site per year?

We have 189205 rows of data for 0 sites.


sites <- res_df[["site"]] |> unique()

res_counts <- res_df |> 
  filter(between(year, 2010, 2019)) |>
  count(site, year) 

res_counts |>
  ggplot() +
  geom_line(aes(year, n, group = site), colour = "grey90") +
  geom_line(aes(year, n, group = site), colour = "red", data = res_counts |> filter(site == sites[2])) +
  labs( y. = "No of observations", 
        caption = "Source: NNB API", 
        x = "Year",
    title = "Annual NBN observation counts per TF site", 
       subtitle = paste0("Site ", res_counts |> filter(site == sites[2]) |> select(site) |> slice(1),  " shown as red line")) +
  theme_minimal() +
  theme(plot.title.position = "plot") +
  scale_x_continuous(n.breaks = 10) 

Split by taxa


res_counts_site <- res_df |> 
  filter(between(year, 2010, 2019), 
         site == sites[2]) |>
  count(site, classs, year) |>
  filter(n > 9) |>
  ggplot(aes(factor(year), n)) +
  geom_col() +
  facet_wrap(~ classs, scales = "free") +
  scale_x_discrete()

res_counts_site

|# label: biodiversity-data-large
|# eval: false

spp_matrix_large <- create_species_matrix(res_df, class = "Insecta")

spp_matrix_large

## convert to presence matrix
## 

spp_pres <- spp_matrix_large$sp |>
  mutate_if(is.numeric, \(x) ifelse(x == 0, 0, 1))
  

Explore data

For some taxa alternative data sources may add additional value. For example data from the 2020 Botanic Society of Britain and Ireland survey may give a more complete plant record for each 1km grid square (see below); and eBIrd data may be more upto data for bird observations.

eBIrd data can be obtained by creating and account on the eBIrd website and completing a request form specifying location and requirements. Once approved the data is made available as a download.

Plant diversity using BSBI data

I have also included functions to extract data for the 2020 Botanic Society of Britain and Ireland survey. This is publicly available from for UK National Grid 1k monads. This requires conversion of lat-longs to UK grids.

grid_ref <- tinyForestR::os_lat_lon_to_grid(lon = sample$lon[2], lat = sample$lat[2])
#> Using virtual environment 'tinyforest' ...

grid_ref$grid
#> [1] "SK5606"
bsbi_data <- tinyForestR::get_bsbi_data(grid_ref = grid_ref$grid)

bsbi_data <- bsbi_data |>
  enframe() |>
  unnest("value") |>
  unnest("value") |>
  mutate(year = str_extract(value, "20\\d{2}"),
         value = str_remove(value, year),
         count = parse_number(value),
         value = str_remove(value, as.character(count)), 
         value = str_remove(value, "\\d{1,}"), 
         grid = grid_ref$grid, 
         tf_id = sample$site[2], 
         species = str_remove(value, "\\(") ,
         species = str_remove(species, "\\)"), 
         species = str_squish(species)) |>
  arrange(value)  |>
  drop_na()

Rapid calculation of biodiversity metrics

The calc_bd_metrics function takes an output from get_nbn_buffer or get_bsbi_data, converts the data from long to wide format, creates a species matrix for a specified class (for get_nbn_buffer data), and outputs a list containing:

metrics <- calc_bd_metrics(df = nbn_data$result, class = "Aves")


metrics$metrics |>
  gt::gt()
month richness N ratio diversity
01 15 15 1.0000000 0.9333333
03 8 8 1.0000000 0.8750000
05 37 105 0.3523810 0.9668934
06 36 74 0.4864865 0.9667641
08 7 7 1.0000000 0.8571429
10 1 1 1.0000000 0.0000000
11 1 1 1.0000000 0.0000000
metrics$plot +
  labs(title = "Monthly species richness for ",
         subtitle = paste("Aves", tf_df$site[1]), 
       y = "Richness", 
       x = "Month") +
  theme(plot.title.position = "plot")

Vegetation indices

The package includes a calc_ndvi_buff function to enable the calculation of normalized vegetation index (NDVI) for the buffer area around a given point. It uses Sentinel2 surface reflectance satellite images which are available at 10m resolution and are regularly updated. The function extracts images via the Google Earth Engine API and requires registration and authentication prior to use (see…).

The function returns a list including, image dates, NDVI statistics for the image, an interactive map and a raster. Note it may take few minutes to run.

The code chunk below calculates the NDVI for each image containing the buffer around a TF for 2019 and 2022 and maps them side-by-side. (Note, the function selects only those S2 images with cloud cover < 10%).

plant_date <- sample$plant_date |>
  unique()

ee <- import("ee")
geemap <- import("geemap")


ee$Initialize()

i <- 10

after <- get_vegetation_indices(tf_id = sites[i], buffer = 30, lon = sample$lon[i], lat = sample$lat[i], start = as.character(sample$plant_date[i]), end = as.character(today() ))
#> Generating URL ...
#> Downloading data from https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/913fc84489280432b0e0b7c0fb25d432-4a4fc643476eba97e2ce07f839807db9:getPixels
#> Please wait ...
#> Data downloaded to /var/folders/bk/jrqs03tx5mq9s28mhml5xzhm0000gn/T/RtmpEnexwO/s2.tif


before <- get_vegetation_indices(tf_id = sites[i], buffer = 30, lon = sample$lon[i], lat = sample$lat[i], end = as.character(sample$plant_date[i]), start = as.character(sample$plant_date[i] - 365))
#> Generating URL ...
#> Downloading data from https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/e31c8c6d6f32586f6ef09e805e321a11-04eb74115a28c8845c093ad93a83b326:getPixels
#> Please wait ...
#> Data downloaded to /var/folders/bk/jrqs03tx5mq9s28mhml5xzhm0000gn/T/RtmpEnexwO/s2.tif


before$sf |>
  ggplot() +
  geom_density(aes(GLI)) +
  geom_density(aes(GLI), data = after$sf, colour = "green") +
  labs(title = paste(sample$plant_date[i], sites[i])) +
  theme_minimal()